DiVA: Indexing high-dimensional data by "diving" into vector approximations

Authors

  • Konstantinos Tsakalozos
  • Spiros Evangelatos
  • Alex Delis
Abstract

Contemporary multimedia, scientific and medical applications use indexing structures to access their high-dimensional data. Yet, in sufficiently high-dimensional spaces, conventional tree-based access methods are eventually outperformed by simple serial scans. Vector quantization has been used effectively to index data that are mostly uniformly distributed. However, in real-world applications, clustered data and skewed query distributions are the norm. In this paper, we propose DiVA, an approach that selectively adapts the quantization step to accommodate varying indexing needs. This adaptation mechanism triggers the restructuring and possible expansion of DiVA so as to provide finer indexing granularity and enhanced access performance in certain “hot” areas of the search space. User-supplied policies help both identify such “hot” areas and satisfy versatile application requirements. Experimentation with our detailed prototype shows that, on a real-world data set, DiVA yields up to 64% less I/O than competing methods such as the VA-file and the A-tree.
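To make the role of the quantization step concrete, here is a minimal sketch of approximation-based filtering with a uniform grid, in the general spirit of vector-approximation indexes. It is not DiVA's algorithm or the VA-file's actual implementation. The function names (`quantize`, `lower_bound`, `range_candidates`), the assumption that coordinates lie in [0, 1), and the use of one global `bits` value per dimension are all illustrative assumptions.

```python
import math

def quantize(vector, bits):
    """Map each coordinate in [0, 1) to an integer cell index using `bits` bits."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in vector)

def lower_bound(query, approx, bits):
    """Smallest possible Euclidean distance from `query` to any point in the cell."""
    cells = 1 << bits
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c / cells, (c + 1) / cells  # cell boundaries in this dimension
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
        # if lo <= q <= hi the dimension contributes nothing to the bound
    return math.sqrt(total)

def range_candidates(query, approximations, bits, radius):
    """Scan the compact approximations; keep ids whose cell may intersect the query ball."""
    return [i for i, a in enumerate(approximations)
            if lower_bound(query, a, bits) <= radius]

# Hypothetical toy data: two nearby points and one distant point.
data = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.2)]
query, radius = (0.12, 0.12), 0.05

coarse = [quantize(v, 2) for v in data]  # 4 cells per dimension
fine = [quantize(v, 4) for v in data]    # 16 cells per dimension

print(range_candidates(query, coarse, 2, radius))
print(range_candidates(query, fine, 4, radius))
```

In this toy setup the coarse grid cannot separate the two nearby points, so both survive the filter and would have to be fetched and checked exactly. The finer grid rules one of them out from the approximation alone. Spending more bits only where queries concentrate, rather than everywhere, is the kind of trade-off the abstract describes.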

Similar resources

DiVA: Using Application-Specific Policies to 'Dive' into Vector Approximations

In high-dimensional data domains, conventional tree-based access structures are occasionally outperformed by simple sequential scans. To this end, the introduction of approximation-based methods helped speed up queries by providing compact representations of stored data. Approximation methods exploit vector quantization to index data mainly presumed to follow a uniform distrib...

A Divisive Hierarchical Clustering-Based Method for Indexing Image Information

It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Much effort has gone into developing multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...

Utilization of Principle Axis Analysis for Fast Nearest Neighbor Searches in High-Dimensional Image Databases

This paper presents an efficient indexing method for similarity searches in high-dimensional image databases by principal axis analysis. Image databases often represent image objects as high-dimensional feature vectors and access them via the feature vectors and a similarity measure. However, the performance of the existing nearest neighbor search methods is far from satisfactory for feature ve...

Vector Approximation based Indexing for High-Dimensional Multimedia Databases

With the proliferation of multimedia data, there is an increasing need to support the indexing and searching of high-dimensional data. In this paper, we propose an efficient indexing method for high-dimensional multimedia databases using the filtering approach, also known as the vector approximation approach, which supports nearest neighbor search efficiently. Our technique called RA +-Blocks (Region...

The Nondeterministic Divide

The nondeterministic divide partitions a vector into two nonempty slices by allowing the point of division to be chosen nondeterministically. Support for high-level divide-and-conquer programming provided by the nondeterministic divide is investigated. A diva algorithm is a recursive divide-and-conquer sequential algorithm on one or more vectors of the same range, whose division point for a new ...

Publication date: 2011